In this paper we examine a number of methods for probing and understanding the large-scale structure of networks that evolve over time. We focus in particular on citation networks, networks of references between documents such as papers, patents, or court cases. We describe three different methods of analysis, one based on an expectation-maximization algorithm, one based on modularity optimization, and one based on eigenvector centrality. Using the network of citations between opinions of the United States Supreme Court as an example, we demonstrate how each of these methods can reveal significant structural divisions in the network, and how, ultimately, the combination of all three can help us develop a coherent overall picture of the network's shape.



I. INTRODUCTION 

The physics community has in recent years devoted considerable attention to the study of networks, including social networks, biological networks, information networks, and others [P, 0]. Many of these networks also have long histories of study in other fields. Citation networks, which are the principal focus of this paper, have been studied quantitatively almost from the moment citation databases first became available, perhaps most famously by the physicist-turned-science-historian Derek de Solla Price, who authored two celebrated papers in the 1960s and 1970s highlighting the power-law degree distributions in networks of scientific papers and developing models to explain their origin [J, 5].

A citation network is an information network in which 
the vertices represent documents of some kind and the 
edges between them represent citation of one document 
by another. Citation networks differ from other networks 
in a number of important ways. First, they are directed: 
citations go from one document to another and hence 
constitute an inherently asymmetric relationship between 
the vertices involved. Mathematically, the network can 
be represented by an adjacency matrix $A$, with elements

$$A_{ij} = \begin{cases} 1 & \text{if there is an edge from } j \text{ to } i, \\ 0 & \text{otherwise.} \end{cases} \qquad (1)$$

In a directed network the adjacency matrix is, in general, asymmetric.
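As a concrete illustration of the convention in Eq. (1), here is a minimal Python sketch. It assumes NumPy and a hypothetical input format in which citations arrive as (citing, cited) index pairs; the function name `citation_adjacency` and the toy data are illustrative only.

```python
import numpy as np

def citation_adjacency(n, citations):
    """Adjacency matrix of Eq. (1) for n documents.

    citations: iterable of (citing, cited) index pairs (hypothetical format).
    The edge runs from the citing document j to the cited document i,
    so the citation sets A[i, j] = 1.
    """
    A = np.zeros((n, n), dtype=int)
    for citing, cited in citations:
        A[cited, citing] = 1
    return A

# Toy network: document 0 is the oldest, document 3 the newest.
A = citation_adjacency(4, [(1, 0), (2, 0), (2, 1), (3, 2)])
print(A)
print(np.array_equal(A, A.T))   # False: citation is an asymmetric relation
print(A.sum(axis=1))            # row sums are the in-degrees (citations received)
```

Because the toy documents are numbered in order of creation, every nonzero entry lies above the diagonal.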

A second feature of citation networks is that they 
evolve over time as new documents are created. The 
time evolution of the network takes a special form, in 
that vertices and edges are added to the network at a
specific time and cannot be removed later. This perma- 
nence of vertices and edges means that the structure of 
the network is mostly static: it changes only at the "lead- 
ing edge" of the network, the current time at which new 
documents are being added. Citation networks differ in 
this respect from other information networks such as the 
world wide web, in which vertices and edges can be removed as well as added and edges can be repositioned after they are added. The limited form of time evolution found in citation networks makes them, in some ways, a simpler and cleaner laboratory for the study of network growth than the web.

FIG. 1: Citations run from vertices created at later times to those created at earlier times, in the opposite direction to the "arrow of time."

The combination of the two features of citation net- 
works described above leads to a third: citation networks 
are acyclic, meaning there are no closed loops of citations 
of the form A cites B cites C cites A, or longer. When 
a new vertex is added to a citation network it can cite 
any of the previously existing vertices, but it cannot cite 
vertices that have not yet been created. This gives the 
network a clear "arrow of time," with all edges pointing 
backwards in time as shown in Fig. 1. As a result it is
typically possible, starting from a given vertex, to find 
a path of citations that takes us back in time through 
the network, but it is not possible to find one that takes 
us forward again, so that no closed loops exist. (Real 
citation networks are often not perfectly acyclic. For example, a scientific paper can sometimes cite work that is
forthcoming but not yet published, resulting in a closed 
loop in the network. However, such loops are rare and 
necessarily short, being limited by the narrow span of 
time over which it is possible to predict future publica- 
tions. In practice, therefore, it is usually a good approx- 
imation to assume the network to be acyclic.) 
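The acyclicity assumption can be checked directly on a dataset. The sketch below, again assuming hypothetical (citing, cited) index pairs, uses Kahn's topological-sort algorithm: vertices with no remaining citations pointing at them are peeled off repeatedly, and the network is acyclic exactly when every vertex can be removed. The name `is_acyclic` is illustrative.

```python
from collections import deque

def is_acyclic(n, citations):
    """Return True if the directed citation network has no closed loops."""
    indeg = [0] * n                  # number of incoming edges (citations received)
    out = [[] for _ in range(n)]     # out[v] = documents cited by v
    for citing, cited in citations:  # edge citing -> cited
        out[citing].append(cited)
        indeg[cited] += 1
    queue = deque(v for v in range(n) if indeg[v] == 0)
    removed = 0
    while queue:
        v = queue.popleft()
        removed += 1
        for w in out[v]:             # removing v deletes its outgoing edges
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return removed == n              # vertices left over lie on (or behind) a cycle

cites = [(1, 0), (2, 0), (2, 1), (3, 2)]
print(is_acyclic(4, cites))             # True
print(is_acyclic(4, cites + [(1, 2)]))  # False: 1 cites the "forthcoming" 2, which cites 1
```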

Citation networks arise in a variety of different areas. 
We have mentioned networks of scientific citations, which 
have been studied by many authors since the classic work of Price mentioned above. (See, for instance, the book by Egghe and Rousseau or any volume of the journal Scientometrics, which is entirely devoted to the quantitative analysis of scholarly authorship and citation patterns.) Citation networks of patents have also been studied, to a lesser extent. Patents cite other patents for a variety of reasons, but most often to establish their originality and distinction from previous work. Extensive data on patent citations have become available in recent years, allowing the construction of very large citation networks [0, l]. Very recently, there has also been interest in legal citation networks, networks of legal opinions written by judges and others, which cite one another to establish precedent. We make extensive use of one particular legal citation network, the network of opinions of the United States Supreme Court, as an example in this paper, although the techniques we will be considering are certainly applicable to other networks as well.

Given the wide interest in and unique structure of ci- 
tation networks, it is instructive to investigate what can 
be learned from an analysis of the statistical patterns 
present in these networks. A variety of studies have been 
presented in the past focusing on relatively standard network measures such as degree distributions [HI, 13, ■]. To investigate the time-dependent structure that is the special property of citation networks, however, other methods are needed. In this paper we present several techniques that, as we will show, are capable, both individually and collectively, of revealing interesting new structure in these networks.



II. A MIXTURE MODEL OF CITATION PATTERNS

The first analysis we describe makes use of a stochastic 
mixture model of the citation process, which is fitted to 
the observed network data using the likelihood optimiza- 
tion technique known as the expectation-maximization 
algorithm. 

A crucial property affecting the structure of citation 
networks is the pattern over time of the citation of docu- 
ments following their publication. It is interesting for in- 
stance to ask if there are typical patterns that documents 
follow. Are there more citations immediately after publi- 
cation than later, or do they grow in frequency over time? 
Are documents more likely to cite recent precedents or 
older better-established ones? Do documents tend to cite 
others published during a particular time period? There 
could also be more than one common pattern with differ- 
ent documents following different patterns. If so, how can 
we determine those patterns, and how can we tell which 
pattern particular documents follow, given that citation 
data are inherently noisy? 

As an example, we consider the network of legal citations between cases handed down by the Supreme Court of the United States, from its inception in 1789 until the
present day. We will use this example throughout this 
paper; it is well documented, shows clear and interest- 
ing structural signatures, and has been studied much less 
than other types of citation networks in the past, so that, 
although we use the network primarily as an illustrative 
device, the results we derive are in many cases of interest 
in their own right and not just as a demonstration of our 
methods. 

Consider the following table, which gives the dates of 
the citations received so far by a single example opinion 
handed down by the Supreme Court in the year 1900: 



year   cites      year   cites      year   cites
1900     1        1907     2        1925     1
1901     4        1910     1        1936     1
1902     3        1912     2        1947     1
1904     1        1920     1

We will take citation profiles such as this as the basic 
inputs in our analysis. 
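Such a profile is simply a count of citing opinions per year. A minimal Python sketch that transcribes the table above; the helper names and the 1900 to 1950 window are arbitrary illustrative choices, not properties of our dataset:

```python
from collections import Counter

def profile_from_citing_years(citing_years):
    """Citation profile as {year: count}, from the decision year of every citing opinion."""
    return dict(Counter(citing_years))

def profile_vector(profile, first_year, last_year):
    """Profile as a list over consecutive years, with zeros for years without citations."""
    return [profile.get(t, 0) for t in range(first_year, last_year + 1)]

# The example 1900 opinion from the table above: one entry per citing opinion.
citing_years = ([1900] + [1901] * 4 + [1902] * 3 + [1904] + [1907] * 2 + [1910]
                + [1912] * 2 + [1920, 1925, 1936, 1947])
z = profile_vector(profile_from_citing_years(citing_years), 1900, 1950)
k = sum(z)   # total number of citations received (the in-degree)
print(k)
```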

One interesting question (there are many) is whether 
there are distinct eras of citation in the history of this 
(or any) citation network. Are there, for instance, eras in 
which a certain set of documents are well cited, followed 
perhaps by another era or eras in which that set falls out 
of favor to be replaced by a different one? Many readers 
can probably think of anecdotal cases of behavior like 
this in scientific citation networks. Here we place these 
observations on a firm analytic foundation. 

We will attempt to divide the vertices in a citation 
network into groups by identifying similarities in their 
citation profiles. Our method will be to define a set of ci- 
tation profiles and then self-consistently assign each case 
to the profile it best fits while at the same time adjusting 
the shape of the profiles to best fit the cases assigned to 
them. The means by which we accomplish this task is 
the expectation-maximization (EM) algorithm [3, lB].

The EM algorithm is an established tool of statistics, 
but one that is relatively new to network analysis. In a 
previous paper we described an application of the method 
to the classification of vertices in static networks, both 
directed and undirected [16]. Here we describe a different application to the analysis of the temporal profiles of citations.

In essence the EM algorithm is a method for fitting a 
model to observed data by likelihood maximization, but 
differs from the maximum likelihood methods most often 
encountered in the physics literature in that it does not 
rely upon Markov chain Monte Carlo sampling of model 
parameters. Instead, by judicious use of "hidden" vari- 
ables, the maximization is performed analytically, result- 
ing in a self-consistent solution for the best-fit parameters 
that can be evaluated using a relatively simple iteration 
scheme. 

Suppose we have a network of $n$ vertices representing our documents and we believe that they can be divided into $c$ groups, each of which is characterized by a particular probability distribution of citations over time. (Ultimately, we will vary $c$ to find the best description for our data, but for the moment let us assume it to be fixed.) Our approach to finding the groups will be to fit the network to a model consisting of two parts: (1) a set of time profiles $\{\theta_r(t)\}$, one for each group, such that $\theta_r(t)$ is the probability that a particular citation received by a document in group $r$ is made during year $t$; (2) a set of probabilities $\pi_r$, such that $\pi_r$ is the probability that a randomly chosen document belongs to group $r$ (i.e., $\pi_r$ is the expected fraction of documents belonging to group $r$). We fit this model to the observed data by maximizing the probability of the observed set of citations given the model, the so-called likelihood function.

Suppose that document $i$ belongs to group $g_i$ and let $z_i(t)$ be the number of citations that the document receives in year $t$. Then the probability that document $i$ received the particular citations it did and is in group $g_i$, given the model parameters, is

$$\Pr(z_i, g_i \mid \pi, \theta) = \Pr(z_i \mid g_i, \pi, \theta)\,\Pr(g_i \mid \pi, \theta), \qquad (2)$$

where for convenience we use $\pi, \theta$ to denote the entire set $\{\pi_r, \theta_r\}$. Assuming random and uncorrelated citations drawn from the time profile $\theta_{g_i}(t)$, the terms on the right-hand side are given by

$$\Pr(z_i \mid g_i, \pi, \theta) = k_i! \prod_{t=t_1}^{t_2} \frac{[\theta_{g_i}(t)]^{z_i(t)}}{z_i(t)!}, \qquad (3)$$

$$\Pr(g_i \mid \pi, \theta) = \pi_{g_i}, \qquad (4)$$

where $k_i = \sum_t z_i(t)$ is the in-degree of document $i$, i.e., the total number of citations it receives, and $t_1$ and $t_2$ are the first and last years of data in our dataset.

Now taking the product over all vertices, the likelihood of the entire data set is $L = \prod_{i=1}^{n} \Pr(z_i, g_i \mid \pi, \theta)$. In fact, we will work with the logarithm $\mathcal{L}$ of the likelihood, which has its maximum in the same place:

$$\mathcal{L} = \ln L = \sum_{i=1}^{n} \bigl[\ln \Pr(g_i \mid \pi, \theta) + \ln \Pr(z_i \mid g_i, \pi, \theta)\bigr]. \qquad (5)$$

Unfortunately, $\mathcal{L}$ depends on the group memberships $g_i$, which we don't know. Given the observed citation patterns, however, we can make a good guess about the group memberships, or more precisely we can compute the probability distribution of their values, which in Bayesian fashion we regard as a statement about our knowledge of the world, rather than a statement about the actual values of the group memberships, which are in theory perfectly well-defined quantities. Writing the probability of a particular assignment of vertices to groups as $\Pr(\{g_i\} \mid z, \pi, \theta)$, we can then calculate the expected value of the log-likelihood as the average of Eq. (5) over all possible assignments thus:

$$\begin{aligned}
\bar{\mathcal{L}} &= \sum_{g_1=1}^{c} \cdots \sum_{g_n=1}^{c} \Pr(\{g_i\} \mid z, \pi, \theta) \sum_{i=1}^{n} \bigl[\ln \Pr(g_i \mid \pi, \theta) + \ln \Pr(z_i \mid g_i, \pi, \theta)\bigr] \\
&= \sum_{i=1}^{n} \sum_{r=1}^{c} \Pr(g_i = r \mid z_i, \pi, \theta) \bigl[\ln \Pr(g_i = r \mid \pi, \theta) + \ln \Pr(z_i \mid g_i = r, \pi, \theta)\bigr] \\
&= \sum_{i=1}^{n} \sum_{r=1}^{c} q_{ir} \Bigl\{\ln \pi_r + \ln k_i! + \sum_{t=t_1}^{t_2} \bigl[z_i(t) \ln \theta_r(t) - \ln z_i(t)!\bigr]\Bigr\}, \qquad (6)
\end{aligned}$$

where we have introduced the shorthand notation

$$q_{ir} = \Pr(g_i = r \mid z_i, \pi, \theta) \qquad (7)$$

for the probability that vertex $i$ belongs to group $r$, given the model and the observed citation pattern.
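In computations, Eq. (3) is best evaluated in logarithmic form, since products of many small probabilities quickly underflow. A minimal sketch using only the Python standard library (`math.lgamma(k + 1)` equals $\ln k!$); the function name is hypothetical:

```python
import math

def log_prob_profile(z, theta):
    """ln Pr(z_i | g_i = r) from Eq. (3), a multinomial with k_i = sum(z) trials.

    z     : citation counts z_i(t) for t = t1, ..., t2
    theta : time profile theta_r(t) over the same years (sums to 1)
    """
    k = sum(z)
    logp = math.lgamma(k + 1)                  # ln k_i!
    for z_t, th in zip(z, theta):
        if z_t > 0:                            # years with z_i(t) = 0 contribute nothing
            logp += z_t * math.log(th) - math.lgamma(z_t + 1)
    return logp

# Two citations spread over two equally likely years: Pr = 2!/(1! 1!) * 0.5 * 0.5 = 0.5
print(math.exp(log_prob_profile([1, 1], [0.5, 0.5])))
```

A year with $z_i(t) > 0$ but $\theta_r(t) = 0$ raises an error here, which correctly reflects a likelihood of zero.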

This expected log-likelihood represents our best estimate of the value of the log-likelihood given what we know about the system. By maximizing it, we can now calculate a best estimate of the most likely values of the model parameters, a process that involves two steps: first, we estimate the group membership probabilities $q_{ir}$; second, we use those probabilities in the maximization of $\bar{\mathcal{L}}$. We take these steps in turn.

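To calculate the $q_{ir}$ we observe that

$$q_{ir} = \Pr(g_i = r \mid z_i, \pi, \theta) = \frac{\Pr(z_i, g_i = r \mid \pi, \theta)}{\Pr(z_i \mid \pi, \theta)}. \qquad (8)$$

The two factors on the right can be determined by summing Eq. (2) over the appropriate sets of variables and making use of Eqs. (3) and (4) to give

$$q_{ir} = \frac{\pi_r \prod_{t} [\theta_r(t)]^{z_i(t)}}{\sum_{s=1}^{c} \pi_s \prod_{t} [\theta_s(t)]^{z_i(t)}}, \qquad (9)$$

the factorials $k_i!$ and $z_i(t)!$ of Eq. (3) having canceled between numerator and denominator.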

Once we have this expression, we can use it to evaluate the log-likelihood, Eq. (6), and hence to find the values of the model parameters that maximize the likelihood, which is our ultimate goal. The maximization is helped by the fact that $\pi_r$ and $\theta_r$ enter Eq. (6) in independent terms. Considering $\pi_r$ first and noting that it must satisfy the normalization condition $\sum_r \pi_r = 1$, we introduce a Lagrange multiplier $\alpha$ and then differentiate, holding $q_{ir}$ constant, to get

$$\frac{\partial}{\partial \pi_r}\Bigl[\sum_{i=1}^{n}\sum_{s=1}^{c} q_{is}\ln\pi_s + \alpha\Bigl(1 - \sum_{s=1}^{c}\pi_s\Bigr)\Bigr] = \frac{1}{\pi_r}\sum_{i=1}^{n} q_{ir} - \alpha = 0. \qquad (10)$$

Rearranging this expression gives

$$\pi_r = \frac{1}{\alpha}\sum_{i=1}^{n} q_{ir}. \qquad (11)$$

The Lagrange multiplier $\alpha$ is then fixed by the condition $\sum_r \pi_r = 1$ thus:

$$\sum_{r=1}^{c}\pi_r = 1 = \frac{1}{\alpha}\sum_{i=1}^{n}\sum_{r=1}^{c} q_{ir} = \frac{n}{\alpha}, \qquad (12)$$

where we have made use of $\sum_r q_{ir} = 1$. Thus $\pi_r$ is given by

$$\pi_r = \frac{1}{n}\sum_{i=1}^{n} q_{ir}. \qquad (13)$$

In other words, the prior probability of a vertex belonging to group $r$ is just the average over all vertices of the conditional probability of belonging to group $r$.

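Similarly, the $\theta_r$ satisfy the normalization condition $\sum_t \theta_r(t) = 1$ for all $r$, so we introduce a set of $c$ Lagrange multipliers $\{\beta_r\}$ and write

$$\frac{\partial}{\partial \theta_r(t)}\Bigl[\sum_{i=1}^{n}\sum_{s=1}^{c} q_{is}\sum_{t'=t_1}^{t_2} z_i(t')\ln\theta_s(t') + \sum_{s=1}^{c}\beta_s\Bigl(1 - \sum_{t'=t_1}^{t_2}\theta_s(t')\Bigr)\Bigr] = 0. \qquad (14)$$

Again holding $q_{ir}$ constant and employing Eq. (3), we find

$$\sum_{i=1}^{n} q_{ir}\,\frac{z_i(t)}{\theta_r(t)} - \beta_r = 0, \qquad (15)$$

$$\theta_r(t) = \frac{\sum_i q_{ir}\, z_i(t)}{\sum_i q_{ir}\, k_i}, \qquad (16)$$

where we have evaluated $\beta_r$ using the normalization condition and the fact that $\sum_t z_i(t) = k_i$ by definition.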

To calculate the optimal values of the model parameters, as well as the group membership variables $q_{ir}$, we now need to solve Eq. (9) simultaneously with Eqs. (13) and (16). The simplest way to do this is numerical iteration. Starting from an initial guess about the values of $\{\pi_r, \theta_r(t)\}$, we evaluate Eq. (9) and then use the results to make an improved estimate of the model parameters from Eqs. (13) and (16). Under reasonable conditions this process is known to converge upon iteration to a self-consistent solution, although in general only to a local maximum of the likelihood, so different starting points may yield different solutions.
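The iteration just described can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code used for the results reported here: profiles are stored as an $n \times T$ count matrix `Z` (a hypothetical layout), the E-step of Eq. (9) is computed in logarithms with the log-sum-exp trick to avoid underflow, a tiny floor stands in for $\ln 0$, and iteration stops once the log-likelihood stops improving. The function name and default arguments are illustrative.

```python
import numpy as np

def em_citation_profiles(Z, c, n_iter=500, tol=1e-10, seed=0):
    """Fit the c-group mixture model by iterating Eqs. (9), (13) and (16).

    Z : (n, T) integer array, Z[i, t] = citations received by document i in year t.
        Rows with no citations carry no information about theta and are best dropped.
    Returns pi with shape (c,), theta with shape (c, T), q with shape (n, c).
    """
    rng = np.random.default_rng(seed)
    n, T = Z.shape
    k = Z.sum(axis=1)                                  # in-degrees k_i
    tiny = 1e-300                                      # floor to avoid log(0)
    pi = np.full(c, 1.0 / c)
    theta = rng.random((c, T))
    theta /= theta.sum(axis=1, keepdims=True)          # random normalized start
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step, Eq. (9), in logs: ln pi_r + sum_t z_i(t) ln theta_r(t).
        # The factorials of Eq. (3) cancel and are omitted.
        log_w = np.log(np.maximum(pi, tiny)) + Z @ np.log(np.maximum(theta, tiny)).T
        m = log_w.max(axis=1, keepdims=True)
        w = np.exp(log_w - m)                          # log-sum-exp for stability
        q = w / w.sum(axis=1, keepdims=True)
        # Log-likelihood at the current parameters, up to an additive constant.
        ll = float(np.sum(m[:, 0] + np.log(w.sum(axis=1))))
        # M-step, Eqs. (13) and (16).
        pi = q.mean(axis=0)
        theta = (q.T @ Z) / np.maximum(q.T @ k, tiny)[:, None]
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, theta, q

# Hypothetical usage on synthetic data: 20 documents cited early, 20 cited late.
Z = np.zeros((40, 10), dtype=int)
Z[:20, :5] = 3
Z[20:, 5:] = 3
pi, theta, q = em_citation_profiles(Z, c=2)
print(pi.round(3))
```

Because EM can stop at a local maximum, it is prudent to rerun with several values of `seed` and compare the resulting fits.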



A. Example 

As a demonstration of the EM method we have applied it to the citation network of Supreme Court cases described in Section II. Applied to this network, the algorithm will divide the network into any requested number $c$ of groups, such that each group is characterized by a distinctive pattern of citations to cases in that group. We have performed the analysis for a variety of different values of $c$. We begin with the simplest case, $c = 2$, of division into two groups. Starting with random initial values for $\{\pi_r, \theta_r\}$ and applying the EM iteration, Eqs. (9), (13), and (16), we find that the parameters rapidly converge to a clear split of the network into two groups. Figure 2 shows the fraction of cases assigned by the algorithm to each of the groups as a function of time. Cases are assigned in proportion to their probability of membership in each of the groups so that, for instance, a case belonging to group 1 with probability 0.7 and to group 2 with probability 0.3 contributes 0.7 of a case to the first group and 0.3 of a case to the second.

FIG. 2: Results of the application of the EM analysis with $c = 2$ to the network of citations between Supreme Court opinions. The two curves show the fraction of cases assigned to each of the two groups found, as a function of time (horizontal axis: year of decision, 1800 to 2000).
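The fractional bookkeeping used for Fig. 2 amounts to a weighted histogram. A minimal NumPy sketch, assuming the membership matrix `q` from an EM fit and an array of decision years; the function name is hypothetical:

```python
import numpy as np

def group_fractions_by_year(q, years):
    """Fraction of cases assigned to each group, per year of decision.

    q     : (n, c) membership probabilities q_ir from the EM fit
    years : length-n sequence of decision years
    Each case contributes q_ir of a case to group r.
    Returns (sorted unique years, (n_years, c) array of fractions).
    """
    q = np.asarray(q, dtype=float)
    uniq, idx = np.unique(np.asarray(years), return_inverse=True)
    counts = np.zeros((len(uniq), q.shape[1]))
    np.add.at(counts, idx, q)        # sum q_ir over the cases decided in each year
    return uniq, counts / counts.sum(axis=1, keepdims=True)

q = [[0.7, 0.3], [1.0, 0.0], [0.0, 1.0]]
yrs, frac = group_fractions_by_year(q, [1900, 1900, 1950])
print(yrs, frac)
```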

Figure 2 reveals a dramatic split between the two groups: the best fit, in the maximum likelihood sense, of the mixture model with two groups to these data produces one group containing practically all cases before 1937 and another containing practically all cases after. This breakpoint coincides with a significant constitutional crisis for the Supreme Court. For the interested reader we give some further analysis in Section V.

The EM algorithm tells us in this case that the Supreme Court's rulings split quite cleanly into groups with distinct citation profiles. That is, the opinions of the court can be distinguished sharply by the cases that later cited them. The citation profiles themselves, meaning the temporal citation patterns represented by the parameters $\{\theta_r\}$ in the model, are shown in Fig. 3. As we can see, they also divide into two time periods, which correspond closely to those of the group memberships depicted in Fig. 2. This implies that the opinions that cite cases in each of our groups were handed down during roughly the same eras as the cited cases. This is not surprising if one assumes that the group divisions reflect different legal ideologies, but it is important to bear in mind that our analysis does not require it: it would be perfectly possible to detect groups that were distinguished by citations received during some entirely different era of the court arbitrarily later in its history, or even in no era at all but scattered widely over time.

FIG. 3: The citation profiles $\theta_r(t)$ generated by the EM algorithm with $c = 2$ for the Supreme Court citation network (curves: group 1 of 2 and group 2 of 2; horizontal axis: year of citation, 1800 to 2000).

FIG. 4: Results of the application of the EM analysis with $c = 4$ to the network of citations between Supreme Court opinions.

We can also ask about best fits to the model for numbers of groups $c$ greater than two. Larger values of $c$ always give fits at least as good as smaller ones, since larger values give us more parameters to fit with, but we must be wary of overfitting. In practice, we have been able to extract useful information about networks by comparing the results for a variety of small values of $c$. Rigorous methods for deciding optimal values of $c$, such as minimum description length, methods based on approximations to the marginal likelihood, or information theoretic measures, have been developed for other applications of the EM algorithm, and we discuss these approaches elsewhere. For the moment we simply describe the results for various values of $c$.

Figure 4 shows results for the Supreme Court network with $c = 4$. The method again finds clear groups of cases, and as in the $c = 2$ case they are strongly delineated according to the dates of the opinions and thus appear to offer evidence for the presence of distinct eras in the court's history. In particular, the analysis finds a clear grouping of cases between 1897 and 1937, corresponding approximately to the so-called Lochner era of Supreme Court jurisprudence, the significance of which is described in Section V.

In these analyses we have characterized our documents 
by the pattern of citations they receive. However, one can 
equally well look at the pattern of citations that docu- 
ments make and this also, at least in some cases, can be 
a useful cue for detecting patterns in the network. The 
EM algorithm can be applied to this analysis as well. The developments are identical and the same computer code can be used; one simply takes the transpose of the adjacency matrix. Figure 5, for example, shows the results of the application of the method to citations made by the opinions in our Supreme Court dataset, with $c = 4$. As the figure shows, the results are remarkably similar to those for citations received: it appears that, in this case at least, there is a high degree of agreement about how cases should be classified into eras. This could indicate agreement between the opinions' writers and those that came after them about the position staked out by individual opinions within the larger body of literature represented in our data set.

FIG. 5: Results of the application of the EM algorithm with $c = 4$ to data for citations made (rather than received) by opinions in our Supreme Court dataset (horizontal axis: year of decision). The groups found are quite similar to those for the analysis based on citations received.
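The transpose trick can be made explicit. In the sketch below (NumPy; the function name and the one-year binning are illustrative assumptions), row $i$ of $A$ lists the documents that cite $i$, so the same function yields citations received when given $A$ and citations made when given $A^T$; either output has the same layout and can be fed to the same EM routine.

```python
import numpy as np

def citation_profiles(A, years, first_year, last_year):
    """Z[i, t] = number of edges linking document i to documents dated first_year + t.

    With A[i, j] = 1 when j cites i (Eq. (1)), citation_profiles(A, ...) counts
    citations received, and citation_profiles(A.T, ...) counts citations made.
    """
    years = np.asarray(years)
    bins = np.arange(first_year, last_year + 1)
    onehot = (years[:, None] == bins[None, :]).astype(int)   # (n, T) year indicator
    return A @ onehot

A = np.zeros((3, 3), dtype=int)
A[0, 1] = A[0, 2] = A[1, 2] = 1          # 1 cites 0; 2 cites 0 and 1
years = [1900, 1901, 1905]
received = citation_profiles(A, years, 1900, 1905)
made = citation_profiles(A.T, years, 1900, 1905)
print(received)
print(made)
```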

III. CLUSTERING IN CITATION NETWORKS

The general problem of the division of networks into 
groups of related vertices has been extensively studied in 
the past. The classic problem of "clustering" or "com- 
munity detection" is to find groups of vertices within net- 
works that have a higher than average density of internal 
edges and relatively few connections to the rest of the 
network [l^]. The second analysis technique we investigate for citation networks is a clustering method of
this kind. As we will see, it is instructive to compare the 
results with those of our EM analysis in the previous sec- 
tion. The two methods do not do the same thing: the EM 
analysis groups together vertices that have similar time 
profiles to their citations, while the community analysis 
groups together vertices that are specifically linked to one 
another by edges. Nonetheless, as we will show, the two 
approaches can produce similar outcomes, for instance in 
the example of the Supreme Court data set. 

Considerable effort has been devoted to the development of methods to find community structure within networks. The authors are aware of dozens of different methods (at least) published within the last few years. Here we make use of a method recently proposed by Newman [21] based on the maximization of the benefit function known as "modularity." Although many competing methods appear to give excellent results, we focus on this particular method for two reasons: first, it is based on firm statistical principles that make its operation transparent to the user; second, it has been shown in recent head-to-head comparisons to give better results on standardized tests than competing methods [20].

Briefly the method works as follows. Given a network and a particular division of the vertices of that network into nonoverlapping groups or communities, the modularity is defined as the number of edges that lie within those groups minus the expected number of such edges if edges are placed at random between the vertices (but respecting vertex degree) [22]. In essence, the modularity measures whether a larger than expected number of edges fall within the groups defined. In principle, the task of finding the best division of the network into groups is then one of maximizing the modularity over all possible divisions [2^]. In practice, this maximization problem is known to be NP-complete [l^l], so approximate solution methods must be used for all but the smallest networks. Newman's method works by rewriting the modularity in the language of linear algebra as a quadratic form involving an index vector and a characteristic matrix dubbed the "modularity matrix." It can then be shown that the signs of the elements of the leading eigenvector of this modularity matrix give an approximation to the division of the network into two parts that maximizes the modularity. This approximate maximum can optionally be further refined by, for instance, applying a greedy algorithm that moves vertices between groups as described in [21]. By repeatedly dividing the network in two in this way, a network can be divided into any number of communities, although typically one stops dividing when no divisions exist that will increase the modularity any further.

FIG. 6: A histogram of the number of decisions versus the year of the decision for cases assigned to each group in the two-way split produced by the modularity maximization algorithm.
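In matrix form, with $s_i = \pm 1$ indicating the side of the split and $B_{ij} = A_{ij} - k_i k_j / 2m$ the modularity matrix of the undirected network (here $k_i$ is the undirected degree and $m$ the number of edges), the modularity is $Q = s^T B s / 4m$. The following NumPy sketch computes the leading-eigenvector split without the optional greedy refinement. Symmetrizing by $(A + A^T) > 0$, which ignores edge direction, and the function name are illustrative choices rather than details fixed by the method.

```python
import numpy as np

def leading_eigenvector_split(A):
    """Two-way split from the signs of the leading eigenvector of the modularity matrix.

    A : directed adjacency matrix; directions are ignored by symmetrizing.
    Returns (s, Q) with s_i = +1 or -1 and Q the modularity of the split.
    """
    U = ((A + A.T) > 0).astype(float)       # undirected, unweighted network
    k = U.sum(axis=1)                        # degrees
    two_m = k.sum()                          # 2m
    B = U - np.outer(k, k) / two_m           # B_ij = A_ij - k_i k_j / 2m
    vals, vecs = np.linalg.eigh(B)           # symmetric; eigenvalues in ascending order
    if vals[-1] <= 1e-12:                    # no split increases the modularity
        return np.ones(len(k), dtype=int), 0.0
    s = np.where(vecs[:, -1] >= 0, 1, -1)    # signs of the leading eigenvector
    return s, float(s @ B @ s / (2 * two_m)) # Q = s^T B s / 4m

# Two triangles joined by a single edge, given as directed citations.
A = np.zeros((6, 6), dtype=int)
for cited, citing in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    A[cited, citing] = 1
print(leading_eigenvector_split(A))
```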

This repeated subdivision of the network into smaller 
and smaller groups is particularly attractive for the pur- 
poses of our present analysis, because it allows us to ob- 
serve the major divisions in the network first, followed by 
more minor ones, and to stop the process at any point to 
compare with our other analyses. A limitation of the 
method is that it is designed for use with undirected 
rather than directed networks. This however is not a 
great hindrance. It seems reasonable to consider edges 
in a citation network to be a sign of connection between 
documents, and that connection exists regardless of the 
direction the edge runs in. So we simply ignore the di- 
rections in our analysis and apply the eigenvector cal- 
culation to the undirected network. This approach has 
been taken before by other authors and appears to work well; see, for example, Ref. [25].

We can visualize the results of our clustering analysis in a manner similar to our visualizations of the output of the EM algorithm, as a histogram over time. The results for the leading split of the Supreme Court network into two clusters are depicted in this way in Fig. 6. The results are similar to those for the EM algorithm, with a significant break around 1937. This appears to bolster the conclusions of our EM analysis, that there have been separate periods in the court's history that leave identifiable signatures in the citation record. There are some differences between the two sets of results, particularly the early "tail" to the second group in the clustering analysis and an overall difference in the number of cases assigned to each group. A possible explanation for these differences is that the EM analysis makes use only of citations received by cases, whereas the clustering analysis, which ignores edge direction, takes into account both citations received and citations made. This allows the classification into groups of some vertices that were unclassifiable with the EM algorithm by virtue of never receiving any citations. (About 10% of cases were never cited.) It could also be responsible for the tail in the second group because citations made, which are necessarily to cases in the past, connect vertices to earlier times, perhaps pulling them from the second group into the first in the clustering analysis.

FIG. 7: A histogram of the number of decisions versus the year of the decision for cases assigned to each group in the four-way split produced by the modularity maximization algorithm.

As with the EM analysis, we can go further and look at 
splits into larger numbers of groups. For instance, Fig. 7
shows the best split into four groups according to the 
modularity-based approach. Again the split is similar in 
overall form to the split found by the EM algorithm with 
c = 4, although the results are not as clean as those for 
the EM algorithm. As before, a new split point appears 
around 1900, which could be associated with the start of 
the Lochner era. 



IV. VERTEX AUTHORITY SCORE AND TIME EVOLUTION

For our third analysis, we turn away from studies of 
groups or clusters and focus on another class of network 
measures: centrality scores, which quantify the impor- 
tance or influence of individual vertices in a network. As 
we will see, the pattern of centrality scores as a func- 
tion of time in our evolving citation networks can reveal 
interesting patterns. 

The simplest of centrality scores is the degree of a vertex. In a directed network such as a citation network, there are two degrees, the in-degree and the out-degree. It is reasonable, for instance, to imagine that important or influential vertices in a citation network will receive many citations and therefore have high in-degree. A more sophisticated version of the same idea is eigenvector centrality [26], in which, rather than merely counting the number of citations a vertex gets, we award a higher score when the citing vertices are themselves influential. The simplest way to do this is to define the centrality to be proportional to the sum of the centralities of the citing vertices, which makes the centralities proportional to the elements of the leading eigenvector of the adjacency matrix. Unfortunately, this method does not work for acyclic directed networks, such as citation networks, for which all such centralities turn out to be zero.
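The reason is that in an acyclic network every path has fewer than $n$ edges, so $A^n = 0$: the adjacency matrix is nilpotent and all its eigenvalues are zero. A small NumPy check; the function name is hypothetical, and the exact matrix power is only practical for small networks.

```python
import numpy as np

def is_nilpotent(A):
    """True if A^n = 0 for the n x n matrix A, so that every eigenvalue is zero."""
    n = A.shape[0]
    return not np.linalg.matrix_power(A, n).any()

# Acyclic toy citation network (A[i, j] = 1 when j cites i).
A = np.zeros((4, 4), dtype=int)
A[0, 1] = A[0, 2] = A[1, 2] = A[2, 3] = 1
print(is_nilpotent(A))        # True

# Repeatedly summing the "centralities" of citing vertices drives every score to zero.
x = np.ones(4)
for _ in range(4):
    x = A @ x
print(x)                      # all zeros
```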

An interesting variant of eigenvector centrality has 
been proposed by Kleinberg [27[ that works well for 
acyclic networks. In this variant each vertex has two 
centralities, known as the authority score and the hub 
score, the first derived from the incoming links and the 
second from the outgoing links. In this view a "hub" is 
a vertex that points to many important authorities — a 
review paper in a citation network, for instance — while 
an authority is a vertex pointed to by many important 
hubs — such as an important or authoritative research ar- 
ticle on a particular subject. In the simplest version of 
the method the authority score Xi of vertex i is simply 
proportional to the sum of the hub scores yj of the ver- 
tices citing it: 



(17) 



for some constant A, while the hub score is proportional 
to the sum of the authority scores of the vertices it cites: 



2/. = ^E^. 



(18) 



In matrix form, these equations can be written 

Ay = Ax, A'^x = fiy. (19) 
Or, eliminating cither x or y. 



AA x = A^x, 
A'^Ay = A^y. 



(20) 
(21) 



Thus $x$ and $y$ are eigenvectors of the symmetric matrices $AA^{T}$ and $A^{T}A$ (also known as the cocitation and bibliographic coupling matrices, respectively). In Kleinberg's formulation of the problem one focuses on the leading eigenvector of each of the matrices, although in principle there could be useful information to be gleaned from other eigenvectors too.
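
Kleinberg's scores can be computed by alternating the two updates in (19) and normalizing after each one, which amounts to power iteration on $AA^{T}$ and $A^{T}A$. The sketch below is a minimal illustration, not the procedure used for the Supreme Court data: the network is hypothetical, the function name `hits` is our own label, and it assumes $A_{ij} = 1$ when document $j$ cites document $i$. Under that convention $(AA^{T})_{ik} = \sum_j A_{ij}A_{kj}$ counts the documents citing both $i$ and $k$ (cocitation), and $(A^{T}A)_{ik} = \sum_j A_{ji}A_{jk}$ counts the documents cited by both (bibliographic coupling).

```python
import numpy as np

def hits(A, max_iter=1000, tol=1e-12):
    """Authority scores x and hub scores y by alternating power iteration.

    Assumes A[i, j] = 1 if document j cites document i, so that A @ y sums the
    hub scores of the documents citing each vertex (authority update) and
    A.T @ x sums the authority scores of the documents each vertex cites
    (hub update). Normalizing to unit length absorbs the constants lambda, mu.
    """
    n = A.shape[0]
    x = np.ones(n)
    y = np.ones(n)
    for _ in range(max_iter):
        x_new = A @ y
        norm = np.linalg.norm(x_new)
        if norm == 0:
            raise ValueError("network has no citations")
        x_new /= norm
        y_new = A.T @ x_new
        y_new /= np.linalg.norm(y_new)
        done = (np.allclose(x_new, x, rtol=0, atol=tol)
                and np.allclose(y_new, y, rtol=0, atol=tol))
        x, y = x_new, y_new
        if done:
            break
    return x, y

# Hypothetical network: (citing, cited) pairs among five documents.
citations = [(1, 0), (2, 0), (2, 1), (3, 1), (4, 2), (4, 3)]
A = np.zeros((5, 5))
for citing, cited in citations:
    A[cited, citing] = 1.0

cocitation = A @ A.T  # entry (i, k): documents citing both i and k
coupling = A.T @ A    # entry (i, k): documents cited by both i and k
authority, hub = hits(A)
print("authority:", np.round(authority, 3))
print("hub:      ", np.round(hub, 3))
```

In this toy network documents 0 and 1 share the top authority score, and document 2, which cites both of them, comes out as the strongest hub.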

Taking the Supreme Court network as an example 
again, we have applied this method to the calculation 
of authority scores for cases in the network. It proves 
particularly revealing to look at the scores as a func- 
tion of time. That is, we take the network as it existed 
at some time t (discarding all cases published after that 
time) and calculate a complete set of authority scores 
for all vertices. We concern ourselves primarily with the 
most central cases, those with the highest scores. Figure [5] shows one particularly revealing statistic, the average age of the ten highest-ranked cases in each year of our data set.

FIG. 8: The average age of the highest-authority cases in the Supreme Court citation network as a function of time.

As the plot shows,
there is a marked trend for the average age to increase 
in step with the passage of time. This is precisely the 
behavior one would expect if the top authorities in the 
network remain the same as time goes by. Every once in a while, however, the plot shows a sudden and precipitous drop in the average age, indicating that a much younger set of vertices has, in a short space
of time, taken over as the new leaders in the authority 
score rankings. Thus the plot indicates a repeated pat- 
tern in the evolution of the network in which a certain 
set of vertices — certain cases considered by the Supreme 
Court — remain the top authorities for substantial peri- 
ods of time before being swiftly replaced by a different 
set. One such turnover can be seen in Fig. [H] around 1900, and a smaller one around 1940, dates that, as we have seen, correspond roughly to the beginning and end of the Lochner era. Another very large dip in the curve occurs around 1970. (Our four-group EM analysis also found a group division at approximately the same time; see Fig. SI.) The large size of this dip may be
due in part to the much larger number of cases decided 
per year by the Supreme Court in more recent decades 
than in its earlier history, which makes it easier for newly 
appearing cases to quickly become top authorities. The 
results of the centrality analysis are thus compatible with, but different from, those of previous sections. Such variations are one reason why a variety of different analytic
techniques are useful in studies of network structure. 

The behavior described is clearest in the age of the top 
ten vertices, but persists if a different number is used. 
Figure [8] shows the results of the same calculation for 
the top 50, 100, and 500 authorities, and in each case a 
similar pattern of maturation followed by swift renewal 
is visible. 
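
The time-sliced calculation described above can be sketched as follows. This is a hypothetical illustration, not the authors' code: the function names, the toy case list, and the convention $A_{ij} = 1$ when case $j$ cites case $i$ are all assumptions. For each year $t$ the sketch discards cases published after $t$, takes the leading eigenvector of the cocitation matrix $AA^{T}$ (Eq. 20) as the authority score, and averages the ages $t - \text{year}$ of the `top_k` highest-scoring cases.

```python
import numpy as np

def authority_scores(A):
    """Authority scores as the leading eigenvector of A A^T (Eq. 20).

    Assumes A[i, j] = 1 if case j cites case i. Eigenvector signs are
    arbitrary, so absolute values are returned. If the leading eigenvalue is
    degenerate the result is not unique; this sketch does not handle that.
    """
    _, vectors = np.linalg.eigh(A @ A.T)  # eigenvalues in ascending order
    return np.abs(vectors[:, -1])

def average_age_of_top_authorities(years, citations, top_k=10):
    """Map each year t to the mean age t - year of the top_k authorities
    in the network as it stood at t (later cases discarded).

    years: publication year of each case, indexed by case id (hypothetical input).
    citations: (citing, cited) pairs of case ids.
    Years in which the truncated network contains no citations are skipped.
    """
    years = np.asarray(years)
    result = {}
    for t in sorted(set(years.tolist())):
        keep = np.flatnonzero(years <= t)  # cases existing at time t
        index = {int(case): k for k, case in enumerate(keep)}
        A = np.zeros((len(keep), len(keep)))
        for citing, cited in citations:
            if citing in index and cited in index:
                A[index[cited], index[citing]] = 1.0
        if not A.any():
            continue
        scores = authority_scores(A)
        top = keep[np.argsort(scores, kind="stable")[::-1][:top_k]]
        result[t] = float(np.mean(t - years[top]))
    return result

# Hypothetical data: case 0 is cited early on; case 4 later overtakes it.
years = [1900, 1901, 1902, 1903, 1904, 1905, 1905, 1906, 1906]
citations = [(1, 0), (2, 0), (3, 0), (5, 4), (6, 4), (7, 4), (8, 4)]
print(average_age_of_top_authorities(years, citations, top_k=1))
# {1901: 1.0, 1902: 2.0, 1903: 3.0, 1904: 4.0, 1905: 5.0, 1906: 2.0}
```

With `top_k=1` the age of the leading authority climbs by one each year while case 0 stays on top, then drops sharply in 1906 when the newer case 4 collects more citations: a miniature version of the maturation-and-renewal pattern described above. On real data, larger values of `top_k` (10, 50, 100, 500) would simply be passed in place of 1.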



V. DISCUSSION 

Although the purpose of this paper is primarily to 
highlight new methods for the analysis of network data, 
the ultimate goal of these methods is of course to give re- 
searchers insight into the structure and meaning of their 
data. Thus it is interesting to ask whether the analy- 
ses described here do in fact shed light on the system 
studied — in this case, the network of citations between 
Supreme Court cases. The results do appear to shed interesting new light on the workings of the Supreme Court; we outline our arguments briefly in this section.

The United States underwent a transition from an agri- 
cultural economy to an industrial economy in the latter 
part of the nineteenth century. Federal and state leg- 
islators adapted to the new economic environment by 
passing laws that regulated emerging industries. These 
regulations, however, were not without opposition from 
those who preferred a laissez-faire or hands-off approach. 
Among those outspoken in opposition were several members of the Supreme Court, and, beginning in 1897, the court began invalidating a number of laws that imposed regulations on industry and business, starting with Allgeyer v. Louisiana. The legal doctrines of substantive due process and freedom of contract were merged into a significant limitation on the police power of the state. After Allgeyer, any statute, ordinance, or administrative act that imposed any kind of limitation upon the
right of private property or freedom of contract became 
suspect, even if the regulation was intended to promote 
safety and general welfare [1^ . 

The most famous (or infamous) of the cases to use sub- 
stantive due process to invalidate state regulation was 
Lochner v. New York in 1905, a case that became so no- 
torious that this entire era of jurisprudence, between 1897 
and 1937, came to be known as the Lochner era. During 
the Lochner era the Supreme Court struck down nearly 
200 regulations [1^. The Lochner era is clearly visible, 
for example, in our EM analysis with c = 4 (Fig. 2): the analysis picks out one group of cases whose start and end dates correspond closely to the accepted dates of the era.

Ultimately, the Supreme Court's hostility to state and 
federal regulation began to interfere with the "New Deal" 
programs instituted by US President Franklin Roosevelt 
to combat the Great Depression. Between 1934 and 1936, 
the court invalidated more federal statutes than during 
any other two-year period in its history, and by 1936
nearly all of the statutes passed as part of the New Deal 
had been struck down. In response, Roosevelt launched 
in early 1937 a counteroffensive against the Supreme 
Court in which he proposed to appoint to the court up to 
six additional justices more receptive to the New Deal. 
This "court packing" plan was, to say the least, highly 
controversial, but Roosevelt had the support of signifi- 
cant majorities in both houses of Congress, and the na- 
tion as a whole, still in the throes of the depression, was 
eager for something new.

Following Roosevelt's proposal, the court abruptly re- 
versed course and, beginning in March of 1937, validated 
a series of state and federal measures. Contemporary 
commentators have humorously dubbed this change the 
"switch in time that saved nine," but whether the switch 
was substantive or illusory has been the subject of much 
debate. Some scholars believe that the court responded 
to political pressure, while others have suggested that 
the court already contained a majority of justices who 
would have been inclined to sustain the New Deal if leg- 
islation had been drafted better or if certain unanswered 
questions had been appropriately posed to the court. 

Our EM analysis shows a clear break around 1937, corresponding closely to the end of the Lochner era. It is important to appreciate that the analysis takes into account only the citations received by cases. The break therefore suggests that the opinions of the Supreme Court underwent a substantial change of direction not merely in impact but
also in their arguments: later cases cited the new opinions 
rather than those coming before them because, presum- 
ably, their arguments better supported the decisions of 
the post-1937 court. Thus our analysis appears to in- 
dicate not merely a change in case outcomes that was 
a natural, if novel, result of positions long held by the 
sitting justices, but a more fundamental change in legal 
thinking itself — or at least its expression in the written 
opinions of the court and the later citation of those opin- 
ions. 



VI. CONCLUSIONS 

In this paper we have described several methods for the analysis of citation networks, which are acyclic directed graphs of citations between documents. Using the network of citations between opinions handed down by the US Supreme Court as an example, we have described and demonstrated three analysis techniques. The first makes use of a probabilistic mixture model fitted to the observed network structure using an expectation-maximization algorithm. The second is a network clustering technique based on the recently introduced method of modularity maximization. The third is an analysis of the patterns of time variation in eigenvector centrality scores, particularly the "authority" score introduced by Kleinberg [13].

When applied to the Supreme Court network, each of 
these analyses reveals interesting structure, particularly 
highlighting qualitative changes in citation patterns that 
may be associated with specific eras of legal thought in 
the Supreme Court. However, it is in combination that 
the methods become most effective. Features that appear 
clearly in analyses performed using several different tech- 
niques possess correspondingly greater persuasive force. 
In the case of the Supreme Court, there emerges quite a 
clear picture of the eras of the court as marked by shifts 
in citation patterns, particularly around the time of the 
so-called Lochner era in the early 20th century. 